Abstract

The  COVID-19  pandemic  caused  by  the  novel  coronavirus  SARS-CoV-2  has  led  to  over  910,000  deaths 
worldwide  and  unprecedented  decimation  of  the  global  economy.  Despite  its  tremendous  impact,  the 
origin  of  SARS-CoV-2  has  remained  mysterious  and  controversial.  The  natural  origin  theory,  although 
widely  accepted,  lacks  substantial  support.  The  alternative  theory  that  the  virus  may  have  come  from  a 
research laboratory is, however, strictly censored in peer-reviewed scientific journals. Nonetheless,
SARS-CoV-2  shows  biological  characteristics  that  are  inconsistent  with  a  naturally  occurring,  zoonotic 
virus.  In  this  report,  we  describe  the  genomic,  structural,  medical,  and  literature  evidence,  which,  when 
considered together, strongly contradicts the natural origin theory. The evidence shows that SARS-CoV-2
should be a laboratory product created by using bat coronaviruses ZC45 and/or ZXC21 as a template
and/or  backbone.  Building  upon  the  evidence,  we  further  postulate  a  synthetic  route  for  SARS-CoV-2, 
demonstrating  that  the  laboratory-creation  of  this  coronavirus  is  convenient  and  can  be  accomplished  in 
approximately  six  months.  Our  work  emphasizes  the  need  for  an  independent  investigation  into  the 
relevant  research  laboratories.  It  also  argues  for  a  critical  look  into  certain  recently  published  data,  which, 
albeit  problematic,  was  used  to  support  and  claim  a  natural  origin  of  SARS-CoV-2.  From  a  public  health 
perspective, these actions are necessary as knowledge of the origin of SARS-CoV-2 and of how the virus
entered the human population is of pivotal importance in the fundamental control of the COVID-19
pandemic  as  well  as  in  preventing  similar,  future  pandemics. 
Introduction

COVID-19  has  caused  a  world-wide  pandemic,  the  scale  and  severity  of  which  are  unprecedented. 
Despite  the  tremendous  efforts  taken  by  the  global  community,  management  and  control  of  this  pandemic 
remains  difficult  and  challenging. 

As  a  coronavirus,  SARS-CoV-2  differs  significantly  from  other  respiratory  and/or  zoonotic  viruses:  it 
attacks  multiple  organs;  it  is  capable  of  undergoing  a  long  period  of  asymptomatic  infection;  it  is  highly 
transmissible and significantly lethal in high-risk populations; it has been well adapted to humans since the very
start of its emergence1; it is highly efficient in binding the human ACE2 receptor (hACE2), the affinity of
which is greater than that associated with the ACE2 of any other potential host2,3.

The  origin  of  SARS-CoV-2  is  still  the  subject  of  much  debate.  A  widely  cited  Nature  Medicine 
publication  has  claimed  that  SARS-CoV-2  most  likely  came  from  nature4.  However,  the  article  and  its 
central conclusion are now being challenged by scientists from all over the world5-15. In addition, authors
of this Nature Medicine article show signs of conflicts of interest16,17, raising further concerns about the
credibility  of  this  publication. 

The  existing  scientific  publications  supporting  a  natural  origin  theory  rely  heavily  on  a  single  piece  of 
evidence  -  a  previously  discovered  bat  coronavirus  named  RaTG13,  which  shares  a  96%  nucleotide 
sequence  identity  with  SARS-CoV-218.  However,  the  existence  of  RaTG13  in  nature  and  the  truthfulness 
of its reported sequence are being widely questioned6-9,19-21. It is noteworthy that scientific journals have
clearly censored any dissenting opinions that suggest a non-natural origin of SARS-CoV-28,22. Because of
this censorship, articles questioning either the natural origin of SARS-CoV-2 or the actual existence of
RaTG13, although of high quality scientifically, can only exist as preprints5-9,19-21 or other non-peer-reviewed
articles published on various online platforms10-13,23. Nonetheless, analyses of these reports have
repeatedly pointed to severe problems and a probable fraud associated with the reporting of RaTG136,8,9,19-21.
Therefore, the theory that fabricated scientific data has been published to mislead the world’s efforts
in  tracing  the  origin  of  SARS-CoV-2  has  become  substantially  convincing  and  is  interlocked  with  the 
notion  that  SARS-CoV-2  is  of  a  non-natural  origin. 

Consistent  with  this  notion,  genomic,  structural,  and  literature  evidence  also  suggest  a  non-natural 
origin  of  SARS-CoV-2.  In  addition,  abundant  literature  indicates  that  gain-of-function  research  has  long 
advanced  to  the  stage  where  viral  genomes  can  be  precisely  engineered  and  manipulated  to  enable  the 
creation  of  novel  coronaviruses  possessing  unique  properties.  In  this  report,  we  present  such  evidence  and 
the associated analyses. Part 1 of the report describes the genomic and structural features of SARS-CoV-2,
the presence of which could be consistent with the theory that the virus is a product of laboratory
modification  beyond  what  could  be  afforded  by  simple  serial  viral  passage.  Part  2  of  the  report  describes 
a  highly  probable  pathway  for  the  laboratory  creation  of  SARS-CoV-2,  key  steps  of  which  are  supported 
by  evidence  present  in  the  viral  genome.  Importantly,  part  2  should  be  viewed  as  a  demonstration  of  how 
SARS-CoV-2  could  be  conveniently  created  in  a  laboratory  in  a  short  period  of  time  using  available 
materials  and  well-documented  techniques.  This  report  is  produced  by  a  team  of  experienced  scientists 
using  our  combined  expertise  in  virology,  molecular  biology,  structural  biology,  computational  biology, 
vaccine  development,  and  medicine. 
1. Has SARS-CoV-2 been subjected to in vitro manipulation?

We  present  three  lines  of  evidence  to  support  our  contention  that  laboratory  manipulation  is  part  of  the 
history  of  SARS-CoV-2: 

i.  The  genomic  sequence  of  SARS-CoV-2  is  suspiciously  similar  to  that  of  a  bat  coronavirus 
discovered  by  military  laboratories  in  the  Third  Military  Medical  University  (Chongqing,  China) 
and  the  Research  Institute  for  Medicine  of  Nanjing  Command  (Nanjing,  China). 

ii.  The  receptor-binding  motif  (RBM)  within  the  Spike  protein  of  SARS-CoV-2,  which  determines 
the  host  specificity  of  the  virus,  resembles  that  of  SARS-CoV  from  the  2003  epidemic  in  a 
suspicious  manner.  Genomic  evidence  suggests  that  the  RBM  has  been  genetically  manipulated. 

iii.  SARS-CoV-2  contains  a  unique  furin-cleavage  site  in  its  Spike  protein,  which  is  known  to  greatly 
enhance  viral  infectivity  and  cell  tropism.  Yet,  this  cleavage  site  is  completely  absent  in  this 
particular  class  of  coronaviruses  found  in  nature.  In  addition,  rare  codons  associated  with  this 
additional  sequence  suggest  the  strong  possibility  that  this  furin-cleavage  site  is  not  the  product  of 
natural  evolution  and  could  have  been  inserted  into  the  SARS-CoV-2  genome  artificially  by 
techniques  other  than  simple  serial  passage  or  multi-strain  recombination  events  inside  co-infected 
tissue  cultures  or  animals. 


1.1  Genomic  sequence  analysis  reveals  that  ZC45,  or  a  closely  related  bat  coronavirus,  should  be 
the  backbone  used  for  the  creation  of  SARS-CoV-2 


The structure of the ~30,000-nucleotide-long SARS-CoV-2 genome is shown in Figure 1. Searching
the NCBI sequence database reveals that, among all known coronaviruses, there were two related bat
coronaviruses, ZC45 and ZXC21, that share the highest sequence identity with SARS-CoV-2 (each bat
coronavirus is ~89% identical to SARS-CoV-2 on the nucleotide level). Similarity between the genome
of SARS-CoV-2 and those of representative β coronaviruses is depicted in Figure 1. ZXC21, which is 97%
identical to and shares a very similar profile with ZC45, is not shown. Note that the RaTG13 virus is
excluded from this analysis given the strong evidence suggesting that its sequence may have been
fabricated and the virus does not exist in nature2,6-9. (A follow-up report, which summarizes the up-to-date
evidence proving the spurious nature of RaTG13, will be submitted soon.)


[Figure 1 image: genomic organization and similarity plot; x-axis: Genome Nucleotide Position]


Figure  1.  Genomic  sequence  analysis  reveals  that  bat  coronavirus  ZC45  is  the  closest  match  to  SARS-CoV-2. 

Top:  genomic  organization  of  SARS-CoV-2  (2019-nCoV  WIV04).  Bottom:  similarity  plot  based  on  the  full-length 
genome of 2019-nCoV WIV04. Full-length genomes of SARS-CoV BJ01, bat SARSr-CoV WIV1, bat SARSr-CoV
HKU3-1, and bat coronavirus ZC45 were used as reference sequences.
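
A similarity plot of the kind shown in Figure 1 can be approximated by sliding a fixed-size window along a pairwise alignment and recording the percent identity within each window. The Python sketch below illustrates the idea only. The function name `windowed_identity`, the window and step sizes, and the toy sequences are assumptions made for illustration; they are not the tool, parameters, or data used to generate Figure 1.

```python
def windowed_identity(ref, query, window=200, step=20):
    """Return (window_start, percent_identity) pairs for two aligned sequences.

    Both inputs must already be aligned to the same length, with '-' marking
    gaps. A column counts as a match only when both sequences carry the same
    non-gap character.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(ref) - window + 1, step):
        a = ref[start:start + window]
        b = query[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        points.append((start, 100.0 * matches / window))
    return points


# Toy data (hypothetical, not real viral sequence): the second half of the
# query is replaced by a run of T, so identity drops in the last two windows.
toy_ref = "ACGT" * 10
toy_query = toy_ref[:20] + "T" * 20
profile = windowed_identity(toy_ref, toy_query, window=10, step=10)
for start, identity in profile:
    print(start, identity)
```

For whole-genome comparisons, a window of a few hundred nucleotides is a common choice, but the appropriate value depends on the alignment and on how much smoothing is desired.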

When  SARS-CoV-2  and  ZC45/ZXC21  are  compared  on  the  amino  acid  level,  a  high  sequence  identity 
is  observed  for  most  of  the  proteins.  The  Nucleocapsid  protein  is  94%  identical.  The  Membrane  protein 
is  98.6%  identical.  The  S2  portion  (2nd  half)  of  the  Spike  protein  is  95%  identical.  Importantly,  the  Orf8 
protein  is  94.2%  identical  and  the  E  protein  is  100%  identical. 
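
The percent identity values quoted above depend on how the two sequences are aligned and on how gap columns are counted. As a minimal sketch, assuming the sequences are already aligned and that gap columns are excluded from the denominator (one common convention, not necessarily the one used by BLAST or in this report), the calculation can be written as follows. The function name `percent_identity` and the ten-residue toy variant are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example: a ten-residue fragment and a hypothetical variant that
# differs at a single position.
fragment = "MYSFVSEETG"
variant = "MYSFVSEKTG"
identity = percent_identity(fragment, variant)
print(identity)
```

Counting gap columns as mismatches instead would lower the reported identity for sequences that contain insertions or deletions, so the convention should be stated whenever such figures are compared.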

Orf8  is  an  accessory  protein,  the  function  of  which  is  largely  unknown  in  most  coronaviruses,  although 
recent  data  suggests  that  Orf8  of  SARS-CoV-2  mediates  the  evasion  of  host  adaptive  immunity  by 
downregulating MHC-I24. Normally, Orf8 is poorly conserved in coronaviruses25. A BLAST sequence search
indicates that, while the Orf8 proteins of ZC45/ZXC21 share a 94.2% identity with SARS-CoV-2 Orf8,
no other coronaviruses share more than 58% identity with SARS-CoV-2 on this particular protein. The
very high homology here on the normally poorly conserved Orf8 protein is highly unusual.


[Figure 2 image: MultAlin alignments of E protein sequences. Panel A (positions 1-76): SARS_GD01, SARS_ExoN1, SARS_TW_GD1, SARS_Sino1_11, Consensus. Panel B (positions 1-76): Bat_AP040581.1, RsSHC014, SC2018, Bat_NP_828854.1, BtRs-BetaCoV/HuB2013, BM48-31/BGR/2008, Consensus. Panel C (positions 1-75): Bat_CoV_ZC45, Bat_CoV_ZXC21, Feb_11, Apr_17, Apr_15_A, Apr_13, Apr_15_B, Consensus.]


Figure 2. Sequence alignment of the E proteins from different β coronaviruses demonstrates the E protein’s
permissiveness and tendency toward amino acid mutations. A. Mutations have been observed in different strains
of SARS-CoV. GenBank accession numbers: SARS_GD01: AY278489.2, SARS_ExoN1: ACB69908.1,
SARS_TW_GD1: AY451881.1, SARS_Sino1_11: AY485277.1. B. Alignment of E proteins from related bat
coronaviruses indicates its tolerance of mutations at multiple positions. GenBank accession numbers:
Bat_AP040581.1: APO40581.1, RsSHC014: KC881005.1, SC2018: MK211374.1, Bat_NP_828854.1:
NP_828854.1, BtRs-BetaCoV/HuB2013: AIA62312.1, BM48-31/BGR/2008: YP_003858586.1. C. While the early
copies of SARS-CoV-2 share 100% identity on the E protein with ZC45 and ZXC21, sequencing data of SARS-CoV-2
from April 2020 indicates that mutation has occurred at multiple positions. Accession numbers of viruses: Feb_11:
MN997409, ZC45: MG772933.1, ZXC21: MG772934, Apr_13: MT326139, Apr_15_A: MT263389, Apr_15_B:
MT293206, Apr_17: MT350246. Alignments were done using the MultAlin webserver
(http://multalin.toulouse.inra.fr/multalin/).

The coronavirus E protein is a structural protein, which is embedded in and lines the interior of the
membrane  envelope  of  the  virion26.  The  E  protein  is  tolerant  of  mutations  as  evidenced  in  both  SARS 
(Figure  2A)  and  related  bat  coronaviruses  (Figure  2B).  This  tolerance  to  amino  acid  mutations  of  the  E 
protein  is  further  evidenced  in  the  current  SARS-CoV-2  pandemic.  After  only  a  short  two-month  spread 
of  the  virus  since  its  outbreak  in  humans,  the  E  proteins  in  SARS-CoV-2  have  already  undergone 
mutational  changes.  Sequence  data  obtained  during  the  month  of  April  reveals  that  mutations  have 
occurred at four different locations in different strains (Figure 2C). Consistent with this finding, BLAST
sequence analysis indicates that, with the exception of SARS-CoV-2, no known coronaviruses share 100%
amino acid sequence identity on the E protein with ZC45/ZXC21 (suspicious coronaviruses published
after the start of the current pandemic are excluded18,27-31). Although 100% identity on the E protein has
been  observed  between  SARS-CoV  and  certain  SARS-related  bat  coronaviruses,  none  of  those  pairs 
simultaneously  share  over  83%  identity  on  the  Orf8  protein32.  Therefore,  the  94.2%  identity  on  the  Orf8 
protein,  100%  identity  on  the  E  protein,  and  the  overall  genomic/amino  acid-level  resemblance  between 
SARS-CoV-2  and  ZC45/ZXC21  are  highly  unusual.  Such  evidence,  when  considered  together,  is 
consistent with a hypothesis that the SARS-CoV-2 genome has an origin based on the use of ZC45/ZXC21
as  a  backbone  and/or  template  for  genetic  gain-of-function  modifications. 

Importantly,  ZC45  and  ZXC21  are  bat  coronaviruses  that  were  discovered  (between  July  2015  and 
February  2017),  isolated,  and  characterized  by  military  research  laboratories  in  the  Third  Military  Medical 
University (Chongqing, China) and the Research Institute for Medicine of Nanjing Command (Nanjing,
China). The data and associated work were published in 201833,34. Clearly, this backbone/template, which
is  essential  for  the  creation  of  SARS-CoV-2,  exists  in  these  and  other  related  research  laboratories. 

What  strengthens  our  contention  further  is  the  published  RaTG13  virus18,  the  genomic  sequence  of 
which  is  reportedly  96%  identical  to  that  of  SARS-CoV-2.  While  suggesting  a  natural  origin  of  SARS- 
CoV-2,  the  RaTG13  virus  also  diverted  the  attention  of  both  the  scientific  field  and  the  general  public 
away  from  ZC45/ZXC214,18.  In  fact,  a  Chinese  BSL-3  lab  (the  Shanghai  Public  Health  Clinical  Centre), 
which  published  a  Nature  article  reporting  a  conflicting  close  phylogenetic  relationship  between  SARS- 
CoV-2  and  ZC45/ZXC21  rather  than  with  RaTG1335,  was  quickly  shut  down  for  “rectification”36.  It  is 
believed that the researchers of that laboratory were being punished for having disclosed the connection
between SARS-CoV-2 and ZC45/ZXC21. On the other hand, substantial evidence has accumulated, pointing to severe
problems associated with the reported sequence of RaTG13 as well as questioning the actual existence of
this bat virus in nature6,7,19-21. A very recent publication also indicated that the receptor-binding domain
(RBD) of RaTG13’s Spike protein could not bind the ACE2 of two different types of horseshoe bats (which are
closely related to the horseshoe bat R. affinis, RaTG13’s alleged natural host)2, implying the inability of
RaTG13  to  infect  horseshoe  bats.  This  finding  further  substantiates  the  suspicion  that  the  reported 
sequence  of  RaTG13  could  have  been  fabricated  as  the  Spike  protein  encoded  by  this  sequence  does  not 
seem  to  carry  the  claimed  function.  The  fact  that  a  virus  has  been  fabricated  to  shift  the  attention  away 
from  ZC45/ZXC21  speaks  for  an  actual  role  of  ZC45/ZXC21  in  the  creation  of  SARS-CoV-2. 

1.2  The  receptor-binding  motif  of  SARS-CoV-2  Spike  cannot  be  born  from  nature  and  should  have 
been  created  through  genetic  engineering 

The  Spike  proteins  decorate  the  exterior  of  the  coronavirus  particles.  They  play  an  important  role  in 
infection as they mediate the interaction with host cell receptors and thereby help determine the host range
and tissue tropism of the virus. The Spike protein is split into two halves (Figure 3). The front or N-terminal
half is named S1, which is fully responsible for binding the host receptor. In both SARS-CoV
and SARS-CoV-2 infections, the host cell receptor is hACE2. Within S1, a segment of around 70 amino
acids makes direct contacts with hACE2 and is correspondingly named the receptor-binding motif (RBM)
(Figure 3C). In SARS-CoV and SARS-CoV-2, the RBM fully determines the interaction with hACE2.
The  C-terminal  half  of  the  Spike  protein  is  named  S2.  The  main  function  of  S2  includes  maintaining  trimer 
formation  and,  upon  successive  protease  cleavages  at  the  S1/S2  junction  and  a  downstream  S2’  position, 
mediating  membrane  fusion  to  enable  cellular  entry  of  the  virus. 


Figure  3.  Structure  of  the  SARS  Spike  protein  and  how  it  binds  to  the  hACE2  receptor.  Pictures  were  generated 
based on PDB ID: 6acj37. A) Three spike proteins, each consisting of an S1 half and an S2 half, form a trimer. B) The
S2 halves (shades of blue) are responsible for trimer formation, while the S1 portion (shades of red) is responsible
for binding hACE2 (dark gray). C) Details of the binding between S1 and hACE2. The RBM of S1, which is
important  and  sufficient  for  binding,  is  colored  in  orange.  Residues  within  the  RBM  that  are  important  for  either 
hACE2  interaction  or  protein  folding  are  shown  as  sticks  (residue  numbers  follow  the  SARS  Spike  sequence). 

[Figure 4 image: multiple sequence alignment of the Spike proteins, with a consensus line; the RBM and the furin-cleavage site are marked.]


Figure  4.  Sequence  alignment  of  the  spike  proteins  from  relevant  coronaviruses.  Viruses  being  compared  include 
SARS-CoV-2 (Wuhan-Hu-1: NC_045512, 2019-nCoV_USA-AZ1: MN997409), bat coronaviruses (Bat_CoV_ZC45:
MG772933, Bat_CoV_ZXC21: MG772934), and SARS coronaviruses (SARS_GZ02: AY390556, SARS:
NC_004718.3). Region marked by two orange lines is the receptor-binding motif (RBM), which is important for
interaction with the hACE2 receptor. Essential residues are additionally highlighted by red sticks on top. Region
marked by two green lines is a furin-cleavage site that exists only in SARS-CoV-2 but not in any other lineage B β
coronavirus.




Similar  to  what  is  observed  for  other  viral  proteins,  S2  of  SARS-CoV-2  shares  a  high  sequence  identity 
(95%) with S2 of ZC45/ZXC21. In stark contrast, between SARS-CoV-2 and ZC45/ZXC21, the S1
protein,  which  dictates  which  host  (human  or  bat)  the  virus  can  infect,  is  much  less  conserved  with  the 
amino  acid  sequence  identity  being  only  69%. 

Figure 4 shows the sequence alignment of the Spike proteins from six β coronaviruses. Two are viruses
isolated from the current pandemic (Wuhan-Hu-1, 2019-nCoV_USA-AZ1); two are the suspected
template viruses (Bat_CoV_ZC45, Bat_CoV_ZXC21); two are SARS coronaviruses (SARS_GZ02,
SARS). The RBM is highlighted in between two orange lines. Clearly, despite the high sequence identity
for the overall genomes, the RBM of SARS-CoV-2 differs significantly from those of ZC45 and ZXC21.
Intriguingly, the RBM of SARS-CoV-2 closely resembles the RBM of SARS Spike. Although
this  is  not  an  exact  “copy  and  paste”,  careful  examination  of  the  Spike-hACE2  structures37,38  reveals  that 
all  residues  essential  for  either  hACE2  binding  or  protein  folding  (orange  sticks  in  Figure  3C  and  what  is 
highlighted  by  red  short  lines  in  Figure  4)  are  “kept”.  Most  of  these  essential  residues  are  precisely 
preserved,  including  those  involved  in  disulfide  bond  formation  (C467,  C474)  and  electrostatic 
interactions  (R444,  E452,  R453,  D454),  which  are  pivotal  for  the  structural  integrity  of  the  RBM  (Figure 
3C and 4). The few changes within the group of essential residues are almost exclusively hydrophobic
“substitutions” (I428→L, L443→F, F460→Y, L472→F, Y484→Q), which should not affect either
protein folding or the hACE2 interaction. At the same time, the majority of the amino acid residues that are
non-essential have “mutated” (Figure 4, RBM residues not labeled with short red lines). Judging from this
sequence  analysis  alone,  we  were  convinced  early  on  that  not  only  would  the  SARS-CoV-2  Spike  protein 
bind  hACE2  but  also  the  binding  would  resemble,  precisely,  that  between  the  original  SARS  Spike  protein 
and  hACE223.  Recent  structural  work  has  confirmed  our  prediction39. 
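
The residue-by-residue comparison described above can be sketched in a few lines of Python: walk along a pairwise alignment, number positions according to the reference sequence, and record every substitution, then check which of them fall on a chosen set of essential positions. The function name `list_substitutions`, the five-column toy alignment, and the `essential_positions` set below are hypothetical and serve only to show the bookkeeping; they are not taken from the Spike alignment in Figure 4.

```python
def list_substitutions(ref, other, first_position=1):
    """List substitutions between two aligned sequences.

    Positions are numbered along `ref`, starting at `first_position`.
    Columns where `ref` has a gap are insertions and receive no number;
    columns where `other` has a gap are deletions and are not reported.
    Returns a list of (position, ref_residue, other_residue) tuples.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    found = []
    position = first_position
    for r, o in zip(ref, other):
        if r == "-":
            continue  # insertion relative to the reference
        if o != "-" and o != r:
            found.append((position, r, o))
        position += 1
    return found


# Hypothetical toy alignment; numbering starts at 428 purely for illustration.
toy_ref = "IT-LF"
toy_other = "LTGLY"
subs = list_substitutions(toy_ref, toy_other, first_position=428)
labels = [f"{r}{pos}{o}" for pos, r, o in subs]

# Hypothetical set of positions regarded as essential.
essential_positions = {428}
essential_changes = [s for s in subs if s[0] in essential_positions]
print(labels, essential_changes)
```

Whether a given substitution is conservative still has to be judged separately, for example from residue properties or from the solved structures.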

As  elaborated  below,  the  way  that  SARS-CoV-2  RBM  resembles  SARS-CoV  RBM  and  the  overall 
sequence conservation pattern between SARS-CoV-2 and ZC45/ZXC21 are highly unusual. Collectively,
this  suggests  that  portions  of  the  SARS-CoV-2  genome  have  not  been  derived  from  natural  quasi-species 
viral  particle  evolution. 

If  SARS-CoV-2  does  indeed  come  from  natural  evolution,  its  RBM  could  have  only  been  acquired  in 
one  of  the  two  possible  routes:  1)  an  ancient  recombination  event  followed  by  convergent  evolution  or  2) 
a  natural  recombination  event  that  occurred  fairly  recently. 

In  the  first  scenario,  the  ancestor  of  SARS-CoV-2,  a  ZC45/ZXC21-like  bat  coronavirus  would  have 
recombined  and  “swapped”  its  RBM  with  a  coronavirus  carrying  a  relatively  “complete”  RBM  (in 
reference  to  SARS).  This  recombination  would  result  in  a  novel  ZC45/ZXC21-like  coronavirus  with  all 
the  gaps  in  its  RBM  “filled”  (Figure  4).  Subsequently,  the  virus  would  have  to  adapt  extensively  in  its  new 
host,  where  the  ACE2  protein  is  highly  homologous  to  hACE2.  Random  mutations  across  the  genome 
would  have  to  have  occurred  to  eventually  shape  the  RBM  to  its  current  form  -  resembling  SARS-CoV 
RBM  in  a  highly  intelligent  manner.  However,  this  convergent  evolution  process  would  also  result  in  the 
accumulation of a large number of mutations in other parts of the genome, rendering the overall sequence
identity relatively low. The high sequence identity between SARS-CoV-2 and ZC45/ZXC21 on various
proteins (94-100% identity) does not support this scenario and, therefore, clearly indicates that SARS-CoV-2
carrying such an RBM cannot come from a ZC45/ZXC21-like bat coronavirus through this convergent
evolutionary route.

In  the  second  scenario,  the  ZC45/ZXC21-like  coronavirus  would  have  to  have  recently  recombined 
and  swapped  its  RBM  with  another  coronavirus  that  had  successfully  adapted  to  bind  an  animal  ACE2 
highly homologous to hACE2. The likelihood of such an event depends, in part, on the general
requirements  of  natural  recombination:  1)  that  the  two  different  viruses  share  significant  sequence 
similarity;  2)  that  they  must  co-infect  and  be  present  in  the  same  cell  of  the  same  animal;  3)  that  the 
recombinant  virus  would  not  be  cleared  by  the  host  or  make  the  host  extinct;  4)  that  the  recombinant  virus 
eventually  would  have  to  become  stable  and  transmissible  within  the  host  species. 

In  regard  to  this  recent  recombination  scenario,  the  animal  reservoir  could  not  be  bats  because  the 
ACE2 proteins in bats are not homologous enough to hACE2 and therefore the adaptation would not be able
to  yield  an  RBM  sequence  as  seen  in  SARS-CoV-2.  This  animal  reservoir  also  could  not  be  humans  as 
the  ZC45/ZXC21-like  coronavirus  would  not  be  able  to  infect  humans.  In  addition,  there  has  been  no 
evidence  of  any  SARS-CoV-2  or  SARS-CoV-2-like  virus  circulating  in  the  human  population  prior  to  late 
2019.  Intriguingly,  according  to  a  recent  bioinformatics  study,  SARS-CoV-2  was  well-adapted  for  humans 
since  the  start  of  the  outbreak1. 

Only one other possibility of natural evolution remains, which is that the ZC45/ZXC21-like virus and
a  coronavirus  containing  a  SARS-like  RBM  could  have  recombined  in  an  intermediate  host  where  the 
ACE2  protein  is  homologous  to  hACE2.  Several  laboratories  have  reported  that  some  of  the  Sunda 
pangolins  smuggled  into  China  from  Malaysia  carried  coronaviruses,  the  receptor-binding  domain  (RBD) 
of which is almost identical to that of SARS-CoV-227-29,31. They then went on to suggest that pangolins
are the likely intermediate host for SARS-CoV-227-29,31. However, recent independent reports have found
significant flaws in this data40-42. Furthermore, contrary to these reports27-29,31, no coronaviruses have been
detected in Sunda pangolin samples collected for over a decade in Malaysia and Sabah between 2009 and
201943. A recent study also showed that the RBD, which is shared between SARS-CoV-2 and the reported
pangolin  coronaviruses,  binds  to  hACE2  ten  times  stronger  than  to  the  pangolin  ACE22,  further  dismissing 
pangolins  as  the  possible  intermediate  host.  Finally,  an  in  silico  study,  while  echoing  the  notion  that 
pangolins  are  not  likely  an  intermediate  host,  also  indicated  that  none  of  the  animal  ACE2  proteins 
examined  in  their  study  exhibited  more  favorable  binding  potential  to  the  SARS-CoV-2  Spike  protein  than 
hACE2 did3. This last study virtually exempted all animals from their suspected roles as an intermediate
host3, which is consistent with the observation that SARS-CoV-2 was well-adapted for humans from the
start of the outbreak1. This is significant because these findings collectively suggest that no intermediate
host  seems  to  exist  for  SARS-CoV-2,  which  at  the  very  least  diminishes  the  possibility  of  a  recombinant 
event  occurring  in  an  intermediate  host. 

Even  if  we  ignore  the  above  evidence  that  no  proper  host  exists  for  the  recombination  to  take  place  and 
instead  assume  that  such  a  host  does  exist,  it  is  still  highly  unlikely  that  such  a  recombination  event  could 
occur  in  nature. 

As we have described above, if a natural recombination event is responsible for the appearance of SARS-CoV-2,
then the ZC45/ZXC21-like virus and a coronavirus containing a SARS-like RBM would have to
recombine in the same cell by swapping the S1/RBM, which is a rare form of recombination. Furthermore,
since SARS has occurred only once in human history, it would be at least equally rare for nature to produce
a virus that resembles SARS in such an intelligent manner - having an RBM that differs from the SARS
RBM only at a few non-essential sites (Figure 4). The possibility that this unique SARS-like coronavirus
would reside in the same cell with the ZC45/ZXC21-like ancestor virus and the two viruses would
recombine in the “RBM-swapping” fashion is extremely low. Importantly, this, and the other
recombination event described below in section 1.3 (which is even less likely to occur in nature), would both
have to happen to produce a Spike as seen in SARS-CoV-2.

While the above evidence and analyses together appear to disprove a natural origin of SARS-CoV-2’s
RBM, abundant literature shows that gain-of-function research, where the Spike protein of a
coronavirus was specifically engineered, has repeatedly led to the successful generation of human-infecting
coronaviruses from coronaviruses of non-human origin44-47.

The record also shows that research laboratories, for example, the Wuhan Institute of Virology (WIV),
have  successfully  carried  out  such  studies  working  with  US  researchers45  and  also  working  alone47.  In 
addition,  the  WIV  has  engaged  in  decades-long  coronavirus  surveillance  studies  and  therefore  owns  the 
world’s  largest  collection  of  coronaviruses.  Evidently,  the  technical  barrier  is  non-existent  for  the  WIV 
and other related laboratories to carry out and succeed in such Spike/RBM engineering and
gain-of-function research.

Figure 5. Two restriction sites are present at either end of the RBM of SARS-CoV-2, providing convenience for
replacing the RBM within the spike gene. A. Nucleotide sequence of the RBM of SARS-CoV-2 (Wuhan-Hu-1). An
EcoRI site is found at the 5’-end of the RBM and a BstEII site at the 3’-end. B. Although these two restriction sites
do not exist in the original spike gene of ZC45, they can be conveniently introduced given that the sequence
discrepancy is small (2 nucleotides) in either case. C. Amino acid sequence alignment with the RBM region
highlighted (color and underscore). The RBM highlighted in orange (top) is what is defined by the EcoRI and BstEII
sites in the SARS-CoV-2 (Wuhan-Hu-1) spike. The RBM highlighted in magenta (middle) is the region swapped by
Dr. Fang Li and colleagues into a SARS Spike backbone39. The RBM highlighted in blue (bottom) is from the Spike
protein (RBM: 424-494) of SARS-BJ01 (AY278488.2), which was swapped by the Shi lab into the Spike proteins of
different bat coronaviruses replacing the corresponding segments47.

Strikingly, consistent with the RBM engineering theory, we have identified two unique restriction sites,
EcoRI and BstEII, at either end of the RBM of the SARS-CoV-2 genome, respectively (Figure 5A). These
two  sites,  which  are  popular  choices  of  everyday  molecular  cloning,  do  not  exist  in  the  rest  of  this  spike 
gene.  This  particular  setting  makes  it  extremely  convenient  to  swap  the  RBM  within  spike,  providing  a 
quick  way  to  test  different  RBMs  and  the  corresponding  Spike  proteins. 
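
The uniqueness claim above is the kind of statement that can be checked mechanically on any sequence. The following is a minimal sketch, not the authors' procedure: it scans a sequence for the EcoRI (GAATTC) and BstEII (GGTNACC) recognition sequences and reports where each occurs. The function names and the short `TOY` string are hypothetical and serve only as an illustration; no real spike sequence is included. Both recognition sequences are palindromic, so scanning one strand is sufficient.

```python
import re

# IUPAC symbols needed for the two recognition sequences (N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# Recognition sequences of the two enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "BstEII": "GGTNACC"}


def find_sites(seq, pattern):
    """Return 0-based start positions of every (possibly overlapping) match."""
    body = "".join(IUPAC[base] for base in pattern)
    regex = re.compile("(?=(" + body + "))")  # lookahead keeps overlapping hits
    return [m.start() for m in regex.finditer(seq.upper())]


def each_site_unique(seq):
    """True if every enzyme in SITES cuts the sequence exactly once."""
    return all(len(find_sites(seq, p)) == 1 for p in SITES.values())


# Hypothetical toy sequence, NOT taken from any viral genome.
TOY = "ATGGAATTCAAACCCGGTTACCTTT"
hits = {name: find_sites(TOY, pattern) for name, pattern in SITES.items()}
print(hits, each_site_unique(TOY))
```

Running the same scan over a full spike-gene sequence obtained from a public database would be the way to confirm or refute that each site occurs only once in that gene.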

Such EcoRI and BstEII sites do not exist in the spike genes of other β coronaviruses, which strongly
indicates  that  they  were  unnatural  and  were  specifically  introduced  into  this  spike  gene  of  SARS-CoV-2 
for  the  convenience  of  manipulating  the  critical  RBM.  Although  ZC45  spike  also  does  not  have  these  two 
sites  (Figure  5B),  they  can  be  introduced  very  easily  as  described  in  part  2  of  this  report. 

It  is  noteworthy  that  introduction  of  the  EcoRI  site  here  would  change  the  corresponding  amino  acids 
from  -WNT-  to  -WNS-  (Figure  5AB).  As  far  as  we  know,  all  SARS  and  SARS-like  bat  coronaviruses 
exclusively  carry  a  T  (threonine)  residue  at  this  location.  SARS-CoV-2  is  the  only  exception  in  that  this  T 
has  mutated  to  an  S  (serine),  save  the  suspicious  RaTG13  and  pangolin  coronaviruses  published  after  the 
outbreak48. 

Once  the  restriction  sites  were  successfully  introduced,  the  RBM  segment  could  be  swapped 
conveniently  using  routine  restriction  enzyme  digestion  and  ligation.  Although  alternative  cloning 
techniques  may  leave  no  trace  of  genetic  manipulation  (Gibson  assembly  as  one  example),  this  old- 
fashioned  approach  could  be  chosen  because  it  offers  a  great  level  of  convenience  in  swapping  this  critical 
RBM. 

Given  that  RBM  fully  dictates  hACE2-binding  and  that  the  SARS  RBM-hACE2  binding  was  fully 
characterized  by  high-resolution  structures  (Figure  3)37,38,  this  RBM-only  swap  would  not  be  any  riskier 
than the full Spike swap. In fact, the feasibility of this RBM-swap strategy has been proven39,47. In 2008,
Dr. Zhengli Shi’s group swapped a SARS RBM into the Spike proteins of several SARS-like bat
coronaviruses after introducing a restriction site into a codon-optimized spike gene (Figure 5C)47. They
then validated the binding of the resulting chimeric Spike proteins with hACE2. Furthermore, in a recent
publication,  the  RBM  of  SARS-CoV-2  was  swapped  into  the  receptor-binding  domain  (RBD)  of  SARS- 
CoV,  resulting  in  a  chimeric  RBD  fully  functional  in  binding  hACE2  (Figure  5C)39.  Strikingly,  in  both 
cases,  the  manipulated  RBM  segments  resemble  almost  exactly  the  RBM  defined  by  the  positions  of  the 
EcoRI  and  BstEII  sites  (Figure  5C).  Although  cloning  details  are  lacking  in  both  publications39,47,  it  is 
conceivable  that  the  actual  restriction  sites  may  vary  depending  on  the  spike  gene  receiving  the  RBM 
insertion  as  well  as  the  convenience  in  introducing  unique  restriction  site(s)  in  regions  of  interest.  It  is 
noteworthy that the corresponding author of this recent publication39, Dr. Fang Li, has been an active
collaborator of Dr. Zhengli Shi49-53 since 2010. Dr. Li was the first person in the world to have structurally
elucidated the binding between SARS-CoV RBD and hACE238 and has been the leading expert in the
structural understanding of Spike-ACE2 interactions38,39,53-56. The striking finding of EcoRI and BstEII
restriction sites at either end of the SARS-CoV-2 RBM, respectively, and the fact that the same RBM
region has been swapped both by Dr. Shi and by her long-term collaborator, respectively, using restriction
enzyme digestion methods are unlikely to be a coincidence. Rather, it is the smoking gun proving that the
RBM/Spike of SARS-CoV-2 is a product of genetic manipulation.

Although  it  may  be  convenient  to  copy  the  exact  sequence  of  SARS  RBM,  it  would  be  too  clear  a  sign 
of artificial design and manipulation. The more deceptive approach would be to change a few
non-essential residues, while preserving the ones critical for binding. This design could be well-guided by the
high-resolution structures (Figure 3)37,38. This way, while the overall sequence of the RBM would appear
to be more distinct from that of the SARS RBM, the hACE2-binding ability would be well-preserved. We
believe that all of the crucial residues (residues labeled with red sticks in Figure 4, which are the same
residues shown in sticks in Figure 3C) should have been “kept”. As described earlier, while some should
have been preserved directly, others should have been switched to residues with similar properties, which would
not disrupt hACE2-binding and may even strengthen the association further. Importantly, changes might
have been made intentionally at non-essential sites, making it less like a “copy and paste” of the SARS
RBM.

1.3  An  unusual  furin-cleavage  site  is  present  in  the  Spike  protein  of  SARS-CoV-2  and  is  associated 
with  the  augmented  virulence  of  the  virus 

Another  unique  motif  in  the  Spike  protein  of  SARS-CoV-2  is  a  polybasic  furin-cleavage  site  located  at 
the  S1/S2  junction  (Figure  4,  segment  in  between  two  green  lines).  Such  a  site  can  be  recognized  and 
cleaved by the furin protease. Within the lineage B of β coronaviruses and with the exception of
SARS-CoV-2, no viruses contain a furin-cleavage site at the S1/S2 junction (Figure 6)57. In contrast, a
furin-cleavage site at this location has been observed in other groups of coronaviruses57,58. Certain selective
pressure seems to be in place that prevents the lineage B of β coronaviruses from acquiring or maintaining
such a site in nature.

Figure 6. Furin-cleavage site found at the S1/S2 junction of Spike is unique to SARS-CoV-2 and absent in other
lineage B β coronaviruses. Figure reproduced from Hoffmann, et al57.

As previously described, during the cell entry process, the Spike protein is first cleaved at the S1/S2
junction.  This  step,  and  a  subsequent  cleavage  downstream  that  exposes  the  fusion  peptide,  are  both 
mediated  by  host  proteases.  The  presence  or  absence  of  these  proteases  in  different  cell  types  greatly 
affects  the  cell  tropism  and  presumably  the  pathogenicity  of  the  viral  infection.  Unlike  other  proteases, 
furin  protease  is  widely  expressed  in  many  types  of  cells  and  is  present  at  multiple  cellular  and 
extracellular  locations.  Importantly,  the  introduction  of  a  furin-cleavage  site  at  the  S1/S2  junction  could 
significantly enhance the infectivity of a virus as well as greatly expand its cell tropism, a phenomenon
well-documented in both influenza viruses and other coronaviruses59-65.

If we leave aside the fact that no furin-cleavage site is found in any lineage B β coronavirus in nature
and instead assume that this site in SARS-CoV-2 is a result of natural evolution, then only one
evolutionary pathway is possible, which is that the furin-cleavage site has to be derived from a
homologous recombination event. Specifically, an ancestor β coronavirus containing no furin-cleavage
site would have to recombine with a closely related coronavirus that does contain a furin-cleavage site.

However,  two  facts  disfavor  this  possibility.  First,  although  some  coronaviruses  from  other  groups  or 
lineages  do  contain  polybasic  furin-cleavage  sites,  none  of  them  contains  the  exact  polybasic  sequence 
present in SARS-CoV-2 (-PRRAR/SVA-). Second, between SARS-CoV-2 and any coronavirus containing
a  legitimate  furin-cleavage  site,  the  sequence  identity  on  Spike  is  no  more  than  40%66.  Such  a  low  level 
of  sequence  identity  rules  out  the  possibility  of  a  successful  homologous  recombination  ever  occurring 
between  the  ancestors  of  these  viruses.  Therefore,  the  furin-cleavage  site  within  the  SARS-CoV-2  Spike 
protein  is  unlikely  to  be  of  natural  origin  and  instead  should  be  a  result  of  laboratory  modification. 

Consistent with this claim, a close examination of the nucleotide sequence of the furin-cleavage site in
SARS-CoV-2 spike has revealed that the two consecutive Arg residues within the inserted sequence
(-PRRA-) are both coded by the rare codon CGG (least used codon for Arg in SARS-CoV-2) (Figure 7)8.
In fact, this CGGCGG arrangement is the only instance found in the SARS-CoV-2 genome where this
rare codon is used in tandem. This observation strongly suggests that this furin-cleavage site should be a
result of genetic engineering. Adding to the suspicion, a FauI restriction site is formed by the codon
choices here, suggesting the possibility that restriction fragment length polymorphism, a technique
that a WIV lab is proficient at67, could have been involved. There, the fragmentation pattern resulting from
FauI digestion could be used to monitor the preservation of the furin-cleavage site in Spike, as this
furin-cleavage site is prone to deletions in vitro68,69. Specifically, RT-PCR on the spike gene of the recovered
viruses from cell cultures or laboratory animals could be carried out, the product of which would be
subjected to FauI digestion. Viruses retaining or losing the furin-cleavage site would then yield distinct
patterns, allowing convenient tracking of the virus(es) of interest.

FauI

tat cag act cag act aat tct cct cgg cgg gca cgt agt gta gct agt caa tcc atc att
 Y   Q   T   Q   T   N   S   P   R   R   A   R   S   V   A   S   Q   S   I   I

Figure 7. Two consecutive Arg residues in the -PRRA- insertion at the S1/S2 junction of SARS-CoV-2 Spike are
both coded by a rare codon, CGG. A FauI restriction site, 5’-(N)6GCGGG-3’, is embedded in the coding sequence
of the “inserted” PRRA segment, which may be used as a marker to monitor the preservation of the introduced
furin-cleavage site.
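
The observations summarized in Figure 7 can be reproduced on the short stretch shown there. The sketch below is an illustration only: the sequence literal is the 60-nucleotide stretch from Figure 7, transcribed so that it translates to the amino-acid line printed beneath it, and the helper names are hypothetical. It translates the stretch with the standard genetic code, locates the tandem CGG codons, and finds the GCGGG core quoted in the caption. It says nothing about how rare CGG is across the whole genome, which would require the full sequence.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# 60-nt stretch around the S1/S2 junction as shown in Figure 7.
FIG7 = ("tat cag act cag act aat tct cct cgg cgg gca cgt "
        "agt gta gct agt caa tcc atc att").replace(" ", "").upper()


def codons(seq):
    """Split a sequence into in-frame codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def tandem_codon_positions(seq, codon):
    """Codon indices i where codon i and codon i+1 are both `codon`."""
    cs = codons(seq)
    return [i for i in range(len(cs) - 1) if cs[i] == codon and cs[i + 1] == codon]


protein = translate(FIG7)                       # amino-acid line of Figure 7
tandem_cgg = tandem_codon_positions(FIG7, "CGG")
faui_core = FIG7.find("GCGGG")                  # 0-based nucleotide offset
print(protein, tandem_cgg, faui_core)
```

Applying `tandem_codon_positions` to every coding sequence of a complete genome would be the direct way to test the statement that the CGGCGG pair occurs only once.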

In addition, although no known coronaviruses contain the exact sequence of -PRRAR/SVA- that is
present in the SARS-CoV-2 Spike protein, a similar -RRAR/AR- sequence has been observed at the S1/S2
junction of the Spike protein in a rodent coronavirus, AcCoV-JC34, which was published in 2017 by
Dr. Zhengli Shi70. It is evident that the legitimacy of -RRAR- as a functional furin-cleavage site has been
known to the WIV experts since 2017.

The  evidence  collectively  suggests  that  the  furin-cleavage  site  in  the  SARS-CoV-2  Spike  protein  may 
not  have  come  from  nature  and  could  be  the  result  of  genetic  manipulation.  The  purpose  of  this 
manipulation  could  have  been  to  assess  any  potential  enhancement  of  the  infectivity  and  pathogenicity  of 
the laboratory-made coronavirus59-64. Indeed, recent studies have confirmed that the furin-cleavage site
does confer significant pathogenic advantages to SARS-CoV-257,68.

1.4  Summary 

Evidence presented in this part reveals that certain aspects of the SARS-CoV-2 genome are extremely
difficult to reconcile with natural evolution. The alternative theory we suggest is that the
virus  may  have  been  created  by  using  ZC45/ZXC21  bat  coronavirus(es)  as  the  backbone  and/or  template. 
The  Spike  protein,  especially  the  RBM  within  it,  should  have  been  artificially  manipulated,  upon  which 
the  virus  has  acquired  the  ability  to  bind  hACE2  and  infect  humans.  This  is  supported  by  the  finding  of  a 
unique  restriction  enzyme  digestion  site  at  either  end  of  the  RBM.  An  unusual  furin-cleavage  site  may 
have  been  introduced  and  inserted  at  the  S1/S2  junction  of  the  Spike  protein,  which  contributes  to  the 
increased  virulence  and  pathogenicity  of  the  virus.  These  transformations  have  then  staged  the  SARS- 
CoV-2  virus  to  eventually  become  a  highly-transmissible,  onset-hidden,  lethal,  sequelae-unclear,  and 
massively  disruptive  pathogen. 

Evidently,  the  possibility  that  SARS-CoV-2  could  have  been  created  through  gain-of-function 
manipulations  at  the  WIV  is  significant  and  should  be  investigated  thoroughly  and  independently. 


2.  Delineation  of  a  synthetic  route  of  SARS-CoV-2 

In  the  second  part  of  this  report,  we  describe  a  synthetic  route  of  creating  SARS-CoV-2  in  a  laboratory 
setting.  It  is  postulated  based  on  substantial  literature  support  as  well  as  genetic  evidence  present  in  the 
SARS-CoV-2  genome.  Although  steps  presented  herein  should  not  be  viewed  as  exactly  those  taken,  we 
believe  that  key  processes  should  not  be  much  different.  Importantly,  our  work  here  should  serve  as  a 
demonstration  of  how  SARS-CoV-2  can  be  designed  and  created  conveniently  in  research  laboratories  by 
following  proven  concepts  and  using  well-established  techniques. 

Importantly, research labs, both in Hong Kong and in mainland China, are leading the world in
coronavirus research, both in terms of resources and in terms of research output. The latter is evidenced not
only by the large number of publications that they have produced over the past two decades but also by
their milestone achievements in the field: they were the first to identify civets as the intermediate host for
SARS-CoV and isolated the first strain of the virus71; they were the first to uncover that SARS-CoV
originated from bats72,73; they revealed for the first time the antibody-dependent enhancement (ADE) of
SARS-CoV infections74; they have contributed significantly to understanding MERS in all domains
(zoonosis, virology, and clinical studies)75-79; they made several breakthroughs in SARS-CoV-2
research18,35,80. Last but not least, they have the world’s largest collection of coronaviruses (genomic
sequences and live viruses). The knowledge, expertise, and resources are all readily available within the
Hong Kong and mainland research laboratories (they collaborate extensively) to carry out and accomplish
the work described below.

Figure 8. Diagram of a possible synthetic route of the laboratory-creation of SARS-CoV-2.

2.1 Possible scheme in designing the laboratory-creation of the novel coronavirus

In  this  sub-section,  we  outline  the  possible  overall  strategy  and  major  considerations  that  may  have 
been  formulated  at  the  designing  stage  of  the  project. 

To  engineer  and  create  a  human-targeting  coronavirus,  they  would  have  to  pick  a  bat  coronavirus  as 
the  template/backbone.  This  can  be  conveniently  done  because  many  research  labs  have  been  actively 
collecting bat coronaviruses over the past two decades32,33,70,72,81-85. However, this template virus ideally
should not be one from Dr. Zhengli Shi’s collections, considering that she is widely known to have been
engaged in gain-of-function studies on coronaviruses. Therefore, ZC45 and/or ZXC21, novel bat
coronaviruses  discovered  and  owned  by  military  laboratories33,  would  be  suitable  as  the 
template/backbone.  It  is  also  possible  that  these  military  laboratories  had  discovered  other  closely  related 
viruses  from  the  same  location  and  kept  some  unpublished.  Therefore,  the  actual  template  could  be  ZC45, 
or  ZXC21,  or  a  close  relative  of  them.  The  postulated  pathway  described  below  would  be  the  same 
regardless  of  which  one  of  the  three  was  the  actual  template. 

Once  they  have  chosen  a  template  virus,  they  would  first  need  to  engineer,  through  molecular  cloning, 
the  Spike  protein  so  that  it  can  bind  hACE2.  The  concept  and  cloning  techniques  involved  in  this 
manipulation have been well-documented in the literature44-46,84,86. With almost no risk of failing, the
template bat virus could then be converted to a coronavirus that can bind hACE2 and infect humans44-46.

Second,  they  would  use  molecular  cloning  to  introduce  a  furin-cleavage  site  at  the  S1/S2  junction  of 
Spike. This manipulation, based on existing knowledge60,61,65, would likely produce a strain of coronavirus
that is more infectious and pathogenic.

Third, they would produce an ORF1b gene construct. The ORF1b gene encodes the polyprotein Orf1b,
which is processed post-translationally to produce individual viral proteins: RNA-dependent RNA
polymerase (RdRp), helicase, guanine-N7 methyltransferase, uridylate-specific endoribonuclease, and
2’-O-methyltransferase. All of these proteins are parts of the replication machinery of the virus. Among
them, the RdRp protein is the most crucial one and is highly conserved among coronaviruses. Importantly,
Dr. Zhengli Shi’s laboratory uses a PCR protocol, which amplifies a particular fragment of the RdRp gene,
as their primary method to detect the presence of coronaviruses in raw samples (bat fecal swabs, feces, etc.).
As a result of this practice, the Shi group has documented the sequence information of this short segment
of RdRp for all coronaviruses that they have successfully detected and/or collected.

Here, the genetic manipulation is less demanding or complicated because Orf1b is conserved and likely
Orf1b from any β coronavirus would be competent enough to do the work. However, we believe that they
would want to introduce a particular Orf1b into the virus for one of the two possible reasons:

1. Since many phylogenetic analyses categorize coronaviruses based on the sequence similarity of
the RdRp gene only18,31,35,83,87, having a different RdRp in the genome therefore could ensure that
SARS-CoV-2 and ZC45/ZXC21 are separated into different groups/sub-lineages in phylogenetic
studies. Choosing an RdRp gene, however, is convenient because the short RdRp segment sequence
has been recorded for all coronaviruses ever collected/detected. Their final choice was the RdRp
sequence from bat coronavirus RaBtCoV/4991, which was discovered in 2013. For
RaBtCoV/4991, the only information ever published was the sequence of its short RdRp segment83,
while neither its full genomic sequence nor virus isolation was ever reported. After amplifying
the RdRp segment (or the whole ORF1b gene) of RaBtCoV/4991, they would have then used it
for subsequent assembly and creation of the genome of SARS-CoV-2. Small changes in the RdRp
sequence could either be introduced at the beginning (through DNA synthesis) or be generated via
passages later on. On a separate track, when they were engaged in the fabrication of the RaTG13
sequence, they could have started with the short RdRp segment of RaBtCoV/4991 without
introducing any changes to its sequence, resulting in a 100% nucleotide sequence identity between
the two viruses on this short RdRp segment83. This RaTG13 virus could then be claimed to have
been discovered back in 2013.

2. The RdRp protein from RaBtCoV/4991 is unique in that it is superior to RdRp from any other
β coronavirus for developing antiviral drugs. RdRp has no homologs in human cells, which makes
this essential viral enzyme a highly desirable target for antiviral development. As an example,
Remdesivir, which is currently undergoing clinical trials, targets RdRp. When creating a novel
and human-targeting virus, they would be interested in developing the antidote as well. Even
though drug discovery like this may not be easily achieved, it is reasonable for them to
intentionally incorporate an RdRp that is more amenable to antiviral drug development.

Fourth, they would use reverse genetics to assemble the gene fragments of spike, ORF1b, and the rest
of the template ZC45 into a cDNA version of the viral genome. They would then carry out in vitro
transcription  to  obtain  the  viral  RNA  genome.  Transfection  of  the  RNA  genome  into  cells  would  allow 
the  recovery  of  live  and  infectious  viruses  with  the  desired  artificial  genome. 

Fifth, they would carry out characterization and optimization of the virus strain(s) to improve the fitness,
infectivity,  and  overall  adaptation  using  serial  passage  in  vivo.  One  or  several  viral  strains  that  meet  certain 
criteria  would  then  be  obtained  as  the  final  product(s). 

2.2  A  postulated  synthetic  route  for  the  creation  of  SARS-CoV-2 

In this sub-section, we describe in more detail how each step could be carried out in a laboratory
setting using available materials and routine molecular, cellular, and virologic techniques. A diagram of
this  process  is  shown  in  Figure  8.  We  estimate  that  the  whole  process  could  be  completed  in  approximately 
6  months. 

Step 1: Engineering the RBM of the Spike for hACE2-binding (1.5 months)

The Spike protein of a bat coronavirus is either incapable of or inefficient in binding hACE2 because
important residues are missing within its RBM. This can be exemplified by the RBM of the template virus
ZC45 (Figure 4). The first and most critical step in the creation of SARS-CoV-2 is to engineer the Spike
so that it acquires the ability to bind hACE2. As evidenced in the literature, such manipulations have been
carried out repeatedly in research laboratories44 since 2008, successfully yielding engineered
coronaviruses with the ability to infect human cells44-46,88,89. Although there are many possible ways that
one can engineer the Spike protein, we believe that what was actually undertaken was that they replaced
the original RBM with a designed and possibly optimized RBM using SARS’ RBM as a guide. As
described in part 1, this theory is supported by our observation that two unique restriction sites, EcoRI and
BstEII, exist at either end of the RBM in the SARS-CoV-2 genome (Figure 5A) and by the fact that such
RBM-swap has been successfully carried out by Dr. Zhengli Shi and by her long-term collaborator and
structural biology expert, Dr. Fang Li39,47.

Although  ZC45  spike  does  not  contain  these  two  restriction  sites  (Figure  5B),  they  can  be  introduced 
very  easily.  The  original  spike  gene  would  be  either  amplified  with  RT-PCR  or  obtained  through  DNA 
synthesis  (some  changes  could  be  safely  introduced  to  certain  variable  regions  of  the  sequence)  followed 
by PCR. The gene would then be cloned into a plasmid using restriction sites other than EcoRI and BstEII.
Once in the plasmid, the spike gene can be modified easily. First, an EcoRI site can be introduced by
converting the highlighted “gaacac” sequence (Figure 5B) to the desired “gaattc” (Figure 5A). The
difference between them is two consecutive nucleotides. Using the commercially available QuikChange
Site-Directed Mutagenesis kit, such a di-nucleotide mutation can be generated in no more than one week.
Subsequently, the BstEII site could be similarly introduced at the other end of the RBM. Specifically, the
“gaatacc” sequence (Figure 5B) would be converted to the desired “ggttacc” (Figure 5A), which would
similarly require a week of time.
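
The statement that each pair of sequences differs at two consecutive nucleotides can be checked by direct comparison. The sketch below is only a string comparison of the four short sequences quoted in the paragraph above; the function name is hypothetical.

```python
def mismatches(a, b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Sequence pairs as quoted in the text (Figure 5B versus Figure 5A).
ecori_diff = mismatches("gaacac", "gaattc")
bsteii_diff = mismatches("gaatacc", "ggttacc")
print(ecori_diff, bsteii_diff)
```

In both cases the comparison returns two adjacent positions, which agrees with the two-nucleotide discrepancy stated in the Figure 5 caption.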

Once  these  restriction  sites,  which  are  unique  within  the  spike  gene  of  SARS-CoV-2,  were  successfully 
introduced,  different  RBM  segments  could  be  swapped  in  conveniently  and  the  resulting  Spike  protein 
subsequently  evaluated  using  established  assays. 

As  described  in  part  1,  the  design  of  an  RBM  segment  could  be  well-guided  by  the  high-resolution 
structures (Figure 3)37,38, yielding a sequence that resembles the SARS RBM in an intelligent manner.
When carrying out the structure-guided design of the RBM, they would have followed the routine and
generated a few (for example a dozen) such RBMs with the hope that some specific variant(s) may be
superior to others in binding hACE2. Once the design was finished, they could have each of the designed
RBM genes commercially synthesized (quick and very affordable) with an EcoRI site at the 5’-end and a
BstEII site at the 3’-end. These novel RBM genes could then be cloned into the spike gene, respectively.
The  gene  synthesis  and  subsequent  cloning,  which  could  be  done  in  a  batch  mode  for  the  small  library  of 
designed  RBMs,  would  take  approximately  one  month. 

These  engineered  Spike  proteins  might  then  be  tested  for  hACE2-binding  using  the  established 
pseudotype virus infection assays45,49,50. The engineered Spike with good to exceptional binding affinities
would  be  selected.  (Although  not  necessary,  directed  evolution  could  be  involved  here  (error-prone  PCR 
on  the  RBM  gene),  coupled  with  either  an  in  vitro  binding  assay39,90  or  a  pseudotype  virus  infection 
assay45,49,50,  to  obtain  an  RBM  that  binds  hACE2  with  exceptional  affinity.) 

Given the abundance of literature on Spike engineering44-46,84,86 and the available high-resolution
structures of the Spike-hACE2 complex37,38, the success of this step would be very much guaranteed. By
the  end  of  this  step,  as  desired,  a  novel  spike  gene  would  be  obtained,  which  encodes  a  novel  Spike  protein 
capable  of  binding  hACE2  with  high  affinity. 

Step  2:  Engineering  a  furin-cleavage  site  at  the  S1/S2  junction  (0.5  month) 

The  product  from  Step  1,  a  plasmid  containing  the  engineered  spike,  would  be  further  modified  to 
include  a  furin-cleavage  site  (segment  indicated  by  green  lines  in  Figure  4)  at  the  S1/S2  junction.  This 
short  stretch  of  gene  sequence  can  be  conveniently  inserted  using  several  routine  cloning  techniques, 
including  QuikChange  Site-Directed  PCR60,  overlap  PCR  followed  by  restriction  enzyme  digestion  and 
ligation91,  or  Gibson  assembly.  None  of  these  techniques  would  leave  any  trace  in  the  sequence. 
Whichever  cloning  method  was  the  choice,  the  inserted  gene  piece  would  be  included  in  the  primers, 
which  would  be  designed,  synthesized,  and  used  in  the  cloning.  This  step,  leading  to  a  further  modified 
Spike  with  the  furin-cleavage  site  added  at  the  S1/S2  junction,  could  be  completed  in  no  more  than  two 
weeks. 

Step  3:  Obtain  an  ORFlb  gene  that  contains  the  sequence  of  the  short  RdRp  segment  from  RaBtCoV/4991 
( 1  month,  vet  can  be  carried  out  concurrently  with  Steps  1  and  2) 
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Unlike  the  engineering  of  Spike,  no  complicated  design  is  needed  here,  except  that  the  RdRp  gene 
segment  from  RaBtCoV/4991  would  need  to  be  included.  Gibson  assembly  could  have  been  used  here.  In 
this  technique,  several  fragments,  each  adjacent  pair  sharing  20-40  bp  overlap,  are  combined  together  in 
one  simple  reaction  to  assemble  a  long  DNA  product.  Two  or  three  fragments,  each  covering  a  significant 
section  of  the  ORFlb  gene,  would  be  selected  based  on  known  bat  coronavirus  sequences.  One  of  these 
fragments  would  be  the  RdRp  segment  of  RaBtCoV/499183.  Each  fragment  would  be  PCR  amplified  with 
proper  overlap  regions  introduced  in  the  primers.  Finally,  all  purified  fragments  would  be  pooled  in 
equimolar  concentrations  and  added  to  the  Gibson  reaction  mixture,  which,  after  a  short  incubation,  would 
yield the desired ORF1b gene in whole.

Step  4:  Produce  the  designed  viral  genome  using  reverse  genetics  and  recover  live  viruses  (0.5  month) 

Reverse genetics has been frequently used in assembling whole viral genomes, including coronavirus
genomes67,92-96. The most recent example is the reconstruction of the SARS-CoV-2 genome using
transformation-assisted recombination in yeast91. Using this method, the Swiss group assembled the entire
viral genome and produced live viruses in just one week97. This efficient technique, which would not leave
any trace of artificial manipulation in the created viral genome, has been available since 2017 [98,99]. In
addition to the engineered spike gene (from steps 1 and 2) and the ORF1b gene (from step 3), other
fragments  covering  the  rest  of  the  genome  would  be  obtained  either  through  RT-PCR  amplification  from 
the  template  virus  or  through  DNA  synthesis  by  following  a  sequence  slightly  altered  from  that  of  the 
template  virus.  We  believe  that  the  latter  approach  was  more  likely  as  it  would  allow  sequence  changes 
introduced  into  the  variable  regions  of  less  conserved  proteins,  the  process  of  which  could  be  easily  guided 
by  multiple  sequence  alignments.  The  amino  acid  sequences  of  more  conserved  functions,  such  as  that  of 
the  E  protein,  might  have  been  left  unchanged.  All  DNA  fragments  would  then  be  pooled  together  and 
transformed  into  yeast,  where  the  cDNA  version  of  the  SARS-CoV-2  genome  would  be  assembled  via 
transformation-assisted recombination. Of course, an alternative method of reverse genetics, one of which
the WIV has successfully used in the past67, could also be employed67,92-96,100. Although some earlier
reverse genetics approaches may leave restriction sites where different fragments are joined, these
traces would be hard to detect as the exact site of ligation can be anywhere in the ~30kb genome. Either
way,  a  cDNA  version  of  the  viral  genome  would  be  obtained  from  the  reverse  genetics  experiment. 
Subsequently,  in  vitro  transcription  using  the  cDNA  as  the  template  would  yield  the  viral  RNA  genome, 
which  upon  transfection  into  Vero  E6  cells  would  allow  the  production  of  live  viruses  bearing  all  of  the 
designed  properties. 

Step 5: Optimize the virus for fitness and improve its hACE2-binding affinity in vivo (2.5-3 months)

Virus recovered from step 4 would need to be further adapted through the classic experiment of serial
passage in laboratory animals101. This final step would validate the virus’ fitness and ensure its
receptor-oriented adaptation toward its intended host, which, according to the analyses above, should be human.
Importantly,  the  RBM  and  the  furin-cleavage  site,  which  were  introduced  into  the  Spike  protein  separately, 
would  now  be  optimized  together  as  one  functional  unit.  Among  various  available  animal  models  (e.g. 
mice,  hamsters,  ferrets,  and  monkeys)  for  coronaviruses,  hACE2  transgenic  mice  (hACE2-mice)  should 
be  the  most  proper  and  convenient  choice  here.  This  animal  model  has  been  established  during  the  study 
of SARS-CoV and has been available from the Jackson Laboratory for many years102-104.

The  procedure  of  serial  passage  is  straightforward.  Briefly,  the  selected  viral  strain  from  step  4,  a 
precursor  of  SARS-CoV-2,  would  be  intranasally  inoculated  into  a  group  of  anaesthetized  hACE2-mice. 
Around  2-3  days  post  infection,  the  virus  in  lungs  would  usually  amplify  to  a  peak  titer.  The  mice  would 
then be sacrificed and the lungs homogenized. Usually, the mouse-lung supernatant, which carries the
highest  viral  load,  would  be  used  to  extract  the  candidate  virus  for  the  next  round  of  passage.  After 
approximately  10-15  rounds  of  passage,  the  hACE2-binding  affinity,  the  infection  efficiency,  and  the 
lethality  of  the  viral  strain  would  be  sufficiently  enhanced  and  the  viral  genome  stabilized101.  Finally,  after 
a series of characterization experiments (e.g. viral kinetics assay, antibody response assay, symptom
observation  and  pathology  examination),  the  final  product,  SARS-CoV-2,  would  be  obtained,  concluding 
the  whole  creation  process.  From  this  point  on,  this  viral  pathogen  could  be  amplified  (most  probably 
using  Vero  E6  cells)  and  produced  routinely. 

It is noteworthy that, based on the work done on SARS-CoV, the hACE2-mouse, although suitable for
SARS-CoV-2 adaptation, is not a good model to reflect the virus’ transmissibility and associated clinical
symptoms in humans. We believe that those scientists might not have used a proper animal model (such
as the golden Syrian hamster) for testing the transmissibility of SARS-CoV-2 before the outbreak of
COVID-19. If they had done this experiment with a proper animal model, the highly contagious nature of
SARS-CoV-2 would have been extremely evident and consequently SARS-CoV-2 would not have been described
as “not causing human-to-human transmission” at the start of the outbreak.

We  also  speculate  that  the  extensive  laboratory-adaptation,  which  is  oriented  toward  enhanced 
transmissibility  and  lethality,  may  have  driven  the  virus  too  far.  As  a  result,  SARS-CoV-2  might  have  lost 
the  capacity  to  attenuate  on  both  transmissibility  and  lethality  during  its  current  adaptation  in  the  human 
population.  This  hypothesis  is  consistent  with  the  lack  of  apparent  attenuation  of  SARS-CoV-2  so  far 
despite  its  great  prevalence  and  with  the  observation  that  a  recently  emerged,  predominant  variant  only 
shows improved transmissibility105-108.

Serial  passage  is  a  quick  and  intensive  process,  where  the  adaptation  of  the  virus  is  accelerated. 
Although  intended  to  mimic  natural  evolution,  serial  passage  is  much  more  limited  in  both  time  and  scale. 
As a result, fewer random mutations would be expected in serial passage than in natural evolution. This is
particularly true for conserved viral proteins, such as the E protein. Critical in viral replication, the E
protein is a determinant of virulence and engineering of it may render SARS-CoV-2 attenuated109-111.
Therefore, at the initial assembly stage, these scientists might have decided to keep the amino acid
sequence of the E protein unchanged from that of ZC45/ZXC21. Due to the conserved nature of the E
protein  and  the  limitations  of  serial  passage,  no  amino  acid  mutation  actually  occurred,  resulting  in  a  100% 
sequence  identity  on  the  E  protein  between  SARS-CoV-2  and  ZC45/ZXC21.  The  same  could  have 
happened  to  the  marks  of  molecular  cloning  (restriction  sites  flanking  the  RBM).  Serial  passage,  which 
should  have  partially  naturalized  the  SARS-CoV-2  genome,  might  not  have  removed  all  signs  of  artificial 
manipulation. 
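
The 100% figure quoted above is a percent-identity measure over aligned amino acid positions. A minimal sketch of how such a figure is typically computed is given below, assuming the two sequences have already been aligned to equal length. The function name and the toy peptides are hypothetical illustrations and are not real viral sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of compared positions at which two aligned sequences match.

    Assumption: the inputs are already aligned and have equal length.
    Columns where both sequences carry a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical toy peptides (placeholders only).
reference = "MKTAYIAKQR"
identical = "MKTAYIAKQR"
one_change = "MKTAYLAKQR"

print(percent_identity(reference, identical))   # 100.0
print(percent_identity(reference, one_change))  # 90.0
```

A single substitution in a ten-residue toy peptide already lowers the value to 90%, which illustrates why identity over a short protein is sensitive to even one amino acid change.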


3.  Final  remarks 

Many  questions  remain  unanswered  about  the  origin  of  SARS-CoV-2.  Prominent  virologists  have 
implied in a Nature Medicine letter that laboratory escape, while not being entirely ruled out, was
unlikely  and  that  no  sign  of  genetic  manipulation  is  present  in  the  SARS-CoV-2  genome4.  However,  here 
we  show  that  genetic  evidence  within  the  spike  gene  of  SARS-CoV-2  genome  (restriction  sites  flanking 
the  RBM;  tandem  rare  codons  used  at  the  inserted  furin-cleavage  site)  does  exist  and  suggests  that  the 
SARS-CoV-2  genome  should  be  a  product  of  genetic  manipulation.  Furthermore,  the  proven  concepts, 
well-established  techniques,  and  knowledge  and  expertise  are  all  in  place  for  the  convenient  creation  of 
this  novel  coronavirus  in  a  short  period  of  time. 

Motives aside, the following facts about SARS-CoV-2 are well-supported:

1. If it was a laboratory product, the most critical element in its creation, the backbone/template virus
(ZC45/ZXC21),  is  owned  by  military  research  laboratories. 

2.  The  genome  sequence  of  SARS-CoV-2  has  likely  undergone  genetic  engineering,  through  which 
the  virus  has  gained  the  ability  to  target  humans  with  enhanced  virulence  and  infectivity. 

3. The characteristics and pathogenic effects of SARS-CoV-2 are unprecedented. The virus is highly
transmissible and lethal, has a hidden onset, targets multiple organs, has unclear sequelae, and is
associated with various symptoms and complications.

4.  SARS-CoV-2  caused  a  world-wide  pandemic,  taking  hundreds  of  thousands  of  lives  and  shutting 
down  the  global  economy.  It  has  a  destructive  power  like  no  other. 

Judging  from  the  evidence  that  we  and  others  have  gathered,  we  believe  that  finding  the  origin  of 
SARS-CoV-2  should  involve  an  independent  audit  of  the  WIV  P4  laboratories  and  the  laboratories  of  their 
close  collaborators.  Such  an  investigation  should  have  taken  place  long  ago  and  should  not  be  delayed  any 
further. 

We  also  note  that  in  the  publication  of  the  chimeric  virus  SHC015-MA15  in  2015,  the  attribution  of 
funding  of  Zhengli  Shi  by  the  NIAID  was  initially  left  out.  It  was  reinstated  in  the  publication  in  2016  in 
a corrigendum, perhaps after the meeting in January 2016 to reinstate NIH funding for gain-of-function
research on viruses. This is unusual scientific behavior that needs an explanation.

What is not thoroughly described in this report is the various evidence indicating that several
coronaviruses recently published (RaTG13 [18], RmYN02 [30], and several pangolin coronaviruses [27-29,31]) are
highly  suspicious  and  likely  fraudulent.  These  fabrications  would  serve  no  purpose  other  than  to  deceive 
the  scientific  community  and  the  general  public  so  that  the  true  identity  of  SARS-CoV-2  is  hidden. 
Although  exclusion  of  details  of  such  evidence  does  not  alter  the  conclusion  of  the  current  report,  we  do 
believe  that  these  details  would  provide  additional  support  for  our  contention  that  SARS-CoV-2  is  a 
laboratory-enhanced virus and a product of gain-of-function research. A follow-up report focusing on such
additional  evidence  is  now  being  prepared  and  will  be  submitted  shortly. 